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Advances in the field of sequencing techniques have resulted in the greatly accelerated 
production of huge sequence datasets. This presents immediate challenges in database 
maintenance at datacenters. It provides additional computational challenges in data min- 
ing and sequence analysis. Together these represent a significant overburden on traditional 
stand-alone computer resources, and to reach effective conclusions quickly and efficiently, 
the virtualization of the resources and computation on a pay-as-you-go concept (together 
termed "cloud computing") has recently appeared. The collective resources of the data- 
center, including both hardware and software, can be available publicly, being then termed 
a public cloud, the resources being provided in a virtual mode to the clients who pay 
according to the resources they employ. Examples of public companies providing these 
resources include Amazon, Google, and Joyent.The computational workload is shifted to 
the provider, which also implements required hardware and software upgrades over time. 
A virtual environment is created in the cloud corresponding to the computational and data 
storage needs of the user via the internet. The task is then performed, the results trans- 
mitted to the user, and the environment finally deleted after all tasks are completed. In 
this discussion, we focus on the basics of cloud computing, and go on to analyze the 
prerequisites and overall working of clouds. Finally, the applications of cloud computing 
in biological systems, particularly in comparative genomics, genome informatics, and SNP 
detection are discussed with reference to traditional workflows. 
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INTRODUCTION 

The accumulation of DNA sequence information, comprising 
merely the order within a simple polymer of the four canoni- 
cal bases (A, T, G, C), has suddenly exploded into the bioscientific 
universe, drawing comparisons to the Big Bang theory of the origin 
of the universe. The development of increasingly high-throughput 
sequencing techniques has revolutionized the DNA world, produc- 
ing extensive DNA datasets housed at various locations around the 
world. Collectively, this information has taken the form of cloud, 
and may be accurately termed a "DNA cloud." The explosion 
in DNA sequence accumulation can be traced to developments 
including pyrosequencing (Franca et al, 2002), nanopore sequenc- 
ing (Branton et al., 2008; Ivanov et al, 2011), single molecule 
sequencing (SMS) technology using DNA polymerases (Nusbaum, 
2009), non-optical sequencing based on detection of pH changes 
(Rothberg et al., 2011), and high-throughput short- read platforms 
such as the Illumina Miseq and Hiseq sequencers (Caporaso et al, 
2012). Despite the development of computers consistently follow- 
ing Moore's law in terms of processing speed, this aspect now 
lags the data storage and maintenance requirements for the large 
amount of DNA sequence data produced by high-throughput 
next-generation sequencing techniques (Shendure and Ji, 2008). 
To cope with this situation, techniques of parallel computation 
using virtual hardware, software, and working platform resources 
have appeared and collectively termed "cloud computing." 



Although the term cloud has been loosely employed as a 
metaphor for the internet, today the scenario has changed 
leading to a change in the definition of cloud computing 
which hearkens back to an earlier definition provided by John 
McCarthy in 1961, who said, "computation may someday be 
organized as public utility" (Speech given to MIT Centennial; 
Garfinkel, 1999). Douglas Parkhill, in his book entitled "The 
challenge of the computer utility" describes all the modern- 
day characteristics of cloud computing, which includes pro- 
viding an elastic environment as a utility, which can be used 
in several forms by the public, private, and community with 
differing usage. Although controversy persists over how to 
properly define a cloud, Forrester's definition (Harris, 2009) 
appears most appropriate: a cloud is a pool of abstracted, 
highly scalable, and managed computer infrastructure capable 
of hosting end-customer applications and billed by consump- 
tion. 

In a broader sense, cloud computing can be defined as an 
elastic execution environment of resources, involving multiple 
stakeholders and providing a metered service at multiple gran- 
ularities for a specified level of quality of service (Vermesan 
and Friess, 2011). Given that cloud computing is now emerg- 
ing as a commercial reality, the following points appear to 
underpin the reason for its commercialization (Armbrust et al, 
2009): 
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1. Data amount: Due to recent advancements in analytical 
techniques, huge amounts of data are generated per day in 
different fields of education and research. To maintain such 
large data sets and provide for their efficient processing, the 
concept of cloud computing offers practical advantages. 

2. Technology support: The development of simple and secure 
online payment modes, the increases in internet and net- 
work speeds accompanying introduction of 2G and 3G services 
and advancements in data compression techniques, all favor 
implementation of cloud computing. 

3. Cost: Advances in miniaturization technology have continu- 
ously reduced the overall cost of cloud computing, as compared 
to earlier technologies (such as Grid Computing, Distrib- 
uted system, and Utility Computing) which previously were 
employed in large datacenters. Presently, Intel is designing 
a Single Chip Cloud Computer which integrates a cloud of 
computers into an integrated chip (Harris, 2009). 

COMPONENTS OF COMPUTATIONAL CLOUDS AND THEIR 
STRUCTURE 

Historically, the definition of computational clouds has not been 
fixed, but has changed to accommodate developments in hardware 
and software; we fully expect its definition to change in adapt- 
ing to future developments. At the present time, computational 
clouds comprise the application(s) used to extract information 
from raw data, the database storing all the information, and the 
physical storage system and servers. Computational clouds are 
configured to provide services to end-users (termed "clients") via 
high speed internet connections. Cloud components and basic 
cloud computing models are illustrated in Figure 1: 



TYPES OF CLOUD 

Based on the availability of the datacenter and the related appli- 
cations to the clients, clouds are of the following three types (Rao 
and Rao, 2009): 

PUBLIC CLOUDS 

These are clouds owned and operated by third parties, aiming at 
individual client satisfaction by providing services at lower cost 
using a pay-as-you-go manner. An identical infrastructure pool 
is shared by all clients, operating with general constraints such as 
data security, and limited configuration to data and data variance. 
All the services are maintained by the cloud providers and may sat- 
isfy various needs as per demand. These services may be accessed 
from within the enterprise (by the user). These cloud services allow 
for a much greater size than may be possible within the enterprise 
using the cloud. Amazon's Elastic Compute Cloud (EC2), IBM's 
Blue Cloud, Sun Cloud, Google's App Engine, and Windows Azure 
Services are some examples of the few public clouds. 

PRIVATE CLOUDS 

These are clouds that are owned and operated by an enterprise 
solely for its own use. Data security and control are generally 
stronger than in public clouds. NASA's Nebula and Amazon's 
virtual private cloud (VPN) are private clouds. Private clouds are 
organized into the following two types: 

A. On Premise Private Cloud: Clouds falling in this category 
maintain clouds within the data center of the organization. 
This provides strong control on data and its flow, and thus best 
suited for private enterprises requiring high security. They are 
also known as internally hosted clouds. 




INTERNET 




FIGURE 1 |The basic model of cloud computing along with components of cloud. Unlike traditional beliefs internet is just the medium to provide services 
rather cloud [courtesy Torry Harris Business Solutions (THBS), copyright 2009 (www.thbs.com) with kind permission and slightly modified]. 
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B. Off Premise Private Cloud: This type of cloud shares data cen- 
ters from different enterprises to form clouds. The security 
level may be a little less stringent, due to the fact that data 
centers are shared. This is best suited for enterprises who are 
not interested in sharing physical storage, but that wish not to 
compromise on security level of data. They are also known as 
externally hosted clouds. 

HYBRID CLOUDS 

These are combinations of both public and private clouds. The 
private cloud providers can use a third-party provider, either in 
partial or full manner, and provide the service to its enterprise. 
The augmentation of private and public clouds via hybrid clouds 
significantly reduces workloads. 

MODELS OF CLOUD COMPUTING 

The services provided under cloud computing can be grouped into 
the following three categories: 

1. Software as a Service (SaaS): As this name suggests, soft- 
ware (the complete application) is served on demand to 
the clients. Multiple users are serviced using the sin- 
gle instance of software, without investing in servers and 
licenses. Only the provider has to pay, and, as a single 
instance, the running cost is much lower than on multiple 
servers. 

2. Platform as a Service (PaaS): Here a working platform is 
provided as a service by encapsulating the required software 
and the working environment to the provider, with this plat- 
form then being used by the clients. Many platforms fall 
into this category, for example, Restricted Java 2 Enterprise 
Edition (J2EE), RUBY, Linux, Apache, MySql, PHP (LAMP), 
with APPERENDA to fulfill manageability and scalability 
requirements. 

3. Infrastructure as a Service (Iaas): This service pro- 
vides computing capabilities and basic storage over 
the network. Here networking equipments, data cen- 
ter space and servers are provided as standardized ser- 
vice. Most common are Amazon, Joyent, GoGrid, and 
Skytap. 



SERVICE PROVIDERS 

The cloud service is offered and maintained internationally and 
nationally by various companies. The clients access computational 
resources over the internet via vendors (for example Amazon EC2) 
in the form of some performance platform such as the Galaxy 
cloud (Afgan et al., 2011). International cloud providers (Table 1) 
specialize in different use-cases and hence offer specific services. 

APPLICATIONS IN BIOLOGICAL SYSTEM 

The application of bioinformatics tools and computational biol- 
ogy to results obtained from wet lab experiments provides a means 
to refine these results and their predictions. It is therefore of 
paramount importance in research and developmental studies to 
achieve reduced time consumption in computational activities, 
despite exponentially growing datasets. The application of cloud 
computing has provided a means to address this problem, and 
applications have been already described in neurosciences (Wat- 
son et al., 2008), biomedical informatics (Rosenthal et al., 2010), 
and bioimage informatics (Peng, 2008). In this section, we briefly 
discuss case-study uses of cloud computing in the following sub- 
fields of genome analysis: SNP detection, comparative genomics, 
genome informatics, and metagenomics. 

GENOME ANALYSIS AND SNP DETECTION 

In the wet lab, SNP detection and validation typically requires large 
numbers of PCR runs occupying many months of time. For exam- 
ple, for analysis of a mere 48 contigs of bread wheat, 1260 PCR runs 
needs to be performed just for SNP detection (Rustgi et al, 2009). 
The validation and assembly then required resequencing, which 
occupied even more time. The above process if implemented on 
cloud can result to provide following advantages. 

Firstly, using bioinformatic approach, a single conventional 
computer required weeks of time to analyze a deep coverage 
human resequencing project and annotate the whole-genome 
(Rust et al., 2002). The cloud application (Hadoop, implement- 
ing MapReduce) whereas solved the same analysis problem in less 
than 3 h without compromising the accuracy rate (Schadt et al, 
2010; Schatzetal.,2010). 

Secondly, this particular cloud implementation used an effi- 
cient whole-genome genotyping tool, Crossbow, combining two 



Table 1 | Name and address of the international cloud service provider, types of clouds, and their web link. 



SI. No 


Company 


Address 


Cloud offering 


Web link 


1 


Amazon 


Street address: 1200 12th Avenue south, Seattle, WA, USA 


Elastic compute cloud 


http://aws.amazon.com/ec2/ 


2 


Bluelock 


5303 Lakeview Parkway South Drive Indianapolis, IN, USA 


Vcloud 


http://www.bluelock.com/ 


3 


CSC 


3170, Fairview, Park drive, Falls church, VA, USA 


Compute cloud, cloud lab 


http://www.csc.com/cloud 


4 


Google 


Daniel Ferguson, Miami, USA 


App engine 


http://www.googlecloudcomputing.net/ 


5 


IBM 


IBM corporation, White plains, NY, USA 


Blue cloud, sun cloud 


http://www.ibm.com/cloud-computing/ 










us/en/ 


6 


Joyent 


Joyent, 345 San francisco, CA, USA 


Smart machine 


http://www.joyentcloud.com/ 


7 


Microsoft 


Microsoft Corporation, Redmond, WA, USA 


Azure 


http://www.microsoft.com/en-in/ 










servercloud/readynow/default.aspx 


8 


Rackspace 


Rackspace, 5000Walzem,TX, USA 


Openstack 


http://www. rackspace.com/ 


9 


Salesforce 


The Landmark @ One Market suite, San francisco, CA, USA 


Salescloud 


http://www.salesforce.com/ 










cloudcomputing/ 


10 


THBS 


Torry Harris business solutions, 536 Fayette, USA 


N/A 


http://www.thbs.com/ 
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software tools (Langmead et al, 2009), the sequence aligner 
"Bowtie" and the SNP caller "SOAPsnp." This combination allowed 
rapid analysis of large sets of DNA sequences whilst maintaining an 
accuracy >98.9% with simulated datasets of individual chromo- 
somes and >99.8% concordance with the Illumina 1 M BeadChip 
assay of a sequenced individual. This implementation also did not 
require extensive software engineering for parallel computation. 
The input file system was distributed over several nodes in cloud 
environment (MapReduce), with Bowtie being called for align- 
ment, followed by SOAPsnp for SNP detection on every allotted 
server (Figure 2). 

Cloud computing realized by MapReduce and Hadoop can be 
leveraged to efficiently parallelize existing serial implementations 
of sequence alignment and genotyping algorithms. This combina- 
tion allows large datasets of DNA sequences to be analyzed rapidly 
without sacrificing accuracy or requiring extensive software engi- 
neering efforts to parallelize the computation (Langmead et al., 
2009). Another cloud based pipeline CloudMap greatly simplifies 
the analysis of mutant genome sequences. It is available on the 
Galaxy web platform and it requires no software installation when 
run on the cloud, but it can also be run locally or via Amazon's 
EC2 service (Minevich et al, 2012). 

COMPARATIVE GENOMICS 

Comparative genomics is the study of the relationship of genome 
structure and function across different biological species or strains. 
Large comparative genomics studies and tools are becoming 
increasingly more compute-expensive as the number of available 
genome sequences continues to rise. The capacity and cost of 
local computing infrastructures are likely to become prohibitive 
with the increase, especially as the breadth of questions con- 
tinues to rise. Alternative computing architectures, in particular 
cloud computing environments, may help alleviate this increasing 
pressure and enable fast, large-scale, and cost-effective compara- 
tive genomics strategies going forward. Cloud computing services 
have emerged as a cost-effective alternative for cluster systems 
as the number of genomes and required computation power to 
analyze them increased in recent years (Wall et al, 2010). The 
tsunami of DNA data brought by the next-generation and third 
generation sequencing techniques is demanding computationally 
intense application for simulation and processing of data which 
cannot be brought by the traditional bioinformatic tool analysis. 
One of the intensive computation demands is Reciprocal Shortest 
Distance (RSD) algorithm for comparative genomics which fur- 
ther increases with the increase in genome size to be analyzed. 
Wheat, having a very large genome as compared to rice, corn, and 
even human, having >80% repetitive DNA and tangled ancestry 
(derived from three different grass families) makes it unsuitable for 
many research purposes. But cloud is a subtle way of resolving this 
problem without worrying about the genome size and type. Basi- 
cally, RSD uses three bioinformatics application for comparison 
purpose (Kudtarkar et al., 2010; Wall et al, 2010). They are: 

1. BLAST: Protein sequence from one genome is compared with 
the other whole-genomes available. The list of hits above a 
threshold is selected. 



2. CLUSTALW: The hits are individually aligned with the original 
query sequence. Again a set of hits above threshold is selected. 

3. Codeml: A program from PAML is used for shortest distance 
calculation by maximum likelihood estimation of amino acid 
substitution. 

Then using phylogenetic analysis sequence having shortest dis- 
tance is retained and it is checked by reciprocal blast against 
genome containing the query sequence. The shortest distance is 
calculated from hits for original sequence. This is iterated several 
times for complete studies. 

This process involves a long toolchain with several formats for 
interchange between several tools often also requiring changes in 
between the output of one tool and the input of the next. Setting 
up the long tool chain with proper configuration in each system 
becomes a very tedious process in the traditional approaches of 
grid computing or distributed computing. In cloud computing, 
cloud providers provide a simple way to duplicate a cloud system. 
This way we only need to setup a single instance. This instance 
can be distributed on multiple servers by duplicating them auto- 
matically, thereby making the process easy to control as only one 
instance had to be setup. 

GENOME INFORMATICS 

Under the traditional flow of genome information, data centers 
used the internet as the pipeline for raw and simulated sequence 
information. The internet also provided users with direct or indi- 
rect access (i.e., through third-party webpages). Problems arose for 
power users, who needed to maintain their own computation and 
storage clusters and also preserve local copies of sequence datasets. 
Similar problems affected organizations maintaining websites and 
value added integrators. 

Nowadays, one can establish an account with Amazon Web 
Services (the earliest service provider to realize a practical cloud 
computing environment) or one of the other commercial vendors, 
launch a virtual machine instance from a wide variety of generic 
and bioinformatics-oriented images and attach any one of sev- 
eral large public genome-oriented datasets. For virtual machine 
images, one can choose images pre-populated with Galaxy: a pow- 
erful web-based system for performing many common genome 
analysis tasks, Bioconductor: a programming environment that 
is integrated with the R statistics package, GBrowse: a genome 
browser, BioPerl: a comprehensive set of bioinformatics modules 
written in the Perl programming language, JCVI Cloud BioLinux: 
a collection of bioinformatics tools including the Celera Assem- 
bler, and a variety of others. Several images that run specialized 
instances of the UCSC Genome Browser are under development. 
The biggest obstacle to moving to the cloud may well be net- 
work bandwidth. A typical research institution will have network 
bandwidth of about a gigabit/second (roughly 125MBps). Cloud 
computing is an attractive technology at this critical juncture. 
This way now power users can create on demand virtual com- 
pute clusters which have a direct access to the datasets and need 
not to download and save local copies to their system (Stein, 
2010). 

Another such platform is the Windows-based cloud comput- 
ing platform, Microsoft Azure, which is not yet as frequently 
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FIGURE 2 | The workflow for the detection of SNPs using cloud computing. Bowtie and SOAPsnp were used in this workflow (adopted from Langmead 
et al., 2009 with kind permission and slightly modified). 



used by the research community as Amazon Web Services. If 
users aim to transfer their Windows-based applications such as 
those coded in ASP.NET into a cloud space, Azure can be one 
of the best candidates with the least modification of the current 
implementation (Kim et al., 2012). 



METAGENOMICS 

Metagenomics generally refers to the creation of general catalogs 
of the genomic constituents of organisms in natural or engineered 
environments, typically based on whole-genome sequencing of 
DNA extracted from highly mixed populations. Metagenomics is 
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a relatively new technique that allows the analysis of DNA sam- 
ples taken from a variety of environments: marine, terrestrial, and 
so forth. Meta Genome Rapid Annotation using Subsystem Tech- 
nology (MG-RAST; Meyer et al, 2008) is currently the leading 
metagenomics analysis facility. It is growing quickly, with 700 new 
datasets added between January and April 2009 alone. Many of 
these datasets stem from previous-generation DNA sequencing 
technology and contain on average only 100 Mbp of data. 

Meta Genome Rapid Annotation using Subsystem Technology 
has a simple workflow; first, the data in fasta format is chunked into 
smaller pieces, then each chunk using BLAST is searched for sim- 
ilarities within the database (Wilkening et al., 2009). This BLAST 
part is heavily computation demanding and thus parallel compu- 
tation is needed along with some hardware development. This can 
be done faster with cloud computing. The developed hardware 
and required environment is provided by provider and the com- 
putation is done in parallel over various distributed servers to save 
time and cost. But the scenario changed here and from the user 
point of view, cloud computing was costlier than native approach 
(Qiu etal.,2010). 

Estimating metagenomic taxonomic content constitutes a key 
problem in metagenomic sequencing data analysis and extract- 
ing such content from high-throughput data of next-generation 
sequencing is very time-consuming. CloudLCA is a parallel LCA 
algorithm that significantly improves the efficiency of determining 
taxonomic composition in metagenomic data analysis. In compar- 
ison with MEGAN, a well-known metagenome analyzer, the speed 
of CloudLCA is up to five more times faster, and its peak mem- 
ory usage is approximately 18.5% that of MEGAN, running on 
a fat node. CloudLCA can be run on one multiprocessor node 
or a cluster. CloudLCA is a universal solution for finding the 
lowest common ancestor, and it can be applied in other fields 
requiring an LCA algorithm (Zhao et al., 2012). Another such 
tool is the CloVR-metagenomics pipeline which employs several 
well-known tools and protocols for the analysis of metagenomic 
whole-genome shotgun (WGS) sequence datasets. 

ADVANTAGES OF CLOUD COMPUTING 

1. Elasticity: Cloud computing maintains exceptional granularity 
on usage of services as a function of time. By this we mean 
it not only identifies specific combinations of applications for 
use with an individual service, but it also actively releases these 
applications when not in use by the individual. This greatly 
increases efficiencies, which results in reduced costs to the end- 
user. Similar improvements in efficiencies are associated with 
implementing updates to these individual applications. 

2. Cost-effectiveness: From the users' point of view, cost- 
effectiveness can be examined under the following sub- 
headings: 

(a) Hardware acquisition, installation, and maintenance: pay- 
ing for virtual hardware on demand is much cheaper than 
buying and maintaining equipment that is constantly being 
improved in performance. In this respect, various hid- 
den costs need to be considered, including amortization 
of capital associated with equipment acquisition and also 
installation (for example, dedicated clean environments, 
and air conditioning). 



(b) Software Costs: Generally, software licenses are costlier 
when invested in computers at every node, rather than 
being installed on service provider and serving node 
computers virtually. 

(c) Server Costs: Although server hardware rates are going 
down rapidly since on demand virtual hiring is much 
cheaper. This is because of the fact that in virtual hiring, 
one pays a very low rate for the compute capacity one actu- 
ally consumes. The following three instances describe the 
advantages of virtual hiring in much detail as incorporated 
in Amazon EC2: 

• On-demand instances - On-demand instances lets you 
pay for compute capacity by the hour with no long- 
term commitments. This frees you from the costs and 
complexities of planning, purchasing, and maintaining 
hardware and transforms what are commonly large fixed 
costs into much smaller variable costs. 

• Reserved Instances - Reserved Instances give you the 
option to make a low, one-time payment for each 
instance you want to reserve and in turn receive a sig- 
nificant discount on the hourly charge for that instance. 
There are three Reserved Instance types (Light, Medium, 
and Heavy Utilization Reserved Instances) that enable 
you to balance the amount you pay upfront with your 
effective hourly price. 

• Spot Instances - Spot Instances allow customers to bid 
on unused capacity and run those instances for as long 
as their bid exceeds the current Spot Price. The Spot 
Price changes periodically based on supply and demand, 
and customers whose bids meet or exceed it gain access 
to the available Spot Instances. If you have flexibility 
in when your applications can run, Spot Instances can 
significantly lower your Amazon EC2 costs. 

3. Reliability and Security: In cloud computing, since data is 
stored virtually, typically between multiple physical locations, 
the data is more secured in terms of physical events [a user's 
system crash, environmental disruptions (earthquake, floods, 
etc.) and physical theft]. This happens to be more secure in 
regard to data loss since there are multiple backups in different 
locations. 

CHALLENGES OF CLOUD COMPUTING 

Despite the benefits outlined above, cloud computing faces many 
challenges which need to be overcome in order to fully exploit 
these benefits and to continue to improve its capabilities. Qian 
et al. (2009) have described some of these challenges: 

1. Data Security: Since data is maintained, served, and secured by 
the service providers, the end-user has to rely on the provider 
for data security. This is a very big issue which is hindering 
commercialization of the cloud application. For example there 
has been a case of private user data being stolen across Dropbox 
accounts due to lack of security measures by the provider. 

2. Data Recovery and Management Systems: In situations where 
any cloud provider becomes underserved due to any reason, the 
resultant damage could be severe, and cannot be rectified in sit- 
uations where data is lost. So more secured and safe recovery 
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systems are required that are completely reliable. There have 
been recent cases such as the situation with the take-down of 
sites like Megaupload.com due to legal copyright issues. In the 
case of megaupload lots of data belonging to users not affected 
by the copyright cases were still lost forever. A similar such sit- 
uation could affect our data stored on the cloud if the cloud 
provider was to be taken down. In order to recover from such a 
scenario one might consider replicating the results of the sim- 
ulation after being processed on the cloud across several cloud 
providers so as to provide a means to recover. 

3. Bioinformatics and Computational Biology Problems: Mul- 
tiple areas of biology continue to be introduced requir- 
ing computational and bioinformatics problem-solving tech- 
niques (Galbraith, 2011). A major problem in this field 
is epistemological (Mushegian, 2011): there is a misap- 
prehension that dry lab results are less accurate than 
wet lab results which has confined its use in research 
work. 

4. Metadata Management and Cloud Provenance: Data prove- 
nance focuses on data flows, data history, inputs, out- 
puts, and data transformations that occur in a cloud. 
This further complicates the management of metadata, 
since documents are no longer stored on local machines 
or file servers, but instead stored remotely at third-party 
data centers. The problem will be complicated by the 
fact that users can modify documents on their third- 
party data centers and transmit them directly without 
sending them via a security solution that automatically 
checks for metadata and/or scrubs it before documents are 
sent. 

Server solutions now include new metadata removal options 
that protect the entire organization by scrubbing email attach- 
ments transmitted via mobile devices and corporate web 
mail. The downside is that server solutions give end-users 
less control over their documents, but they may still be the 



preferred solution for environments where end-user control is 
not a priority. 

These are the major areas to be considered recently to bring 
the organization and users on the virtual environment. There are 
many technical obstacles for adoption, growth, policy, and business 
obstacles for shifting to cloud computing. 

CONCLUSION 

Cloud computing is a field which is combined deployment of 
many ideas, technologies from different subject areas. Thus it is 
overall implementation of applications to handle the real world 
problem, the tsunami of data. There is a lot more opportunities 
in the fields of research and development as it is a growing field 
and yet to be fully commercialized. The cloud computing in bio- 
logical system will change the scenario of the approach toward 
solution of biological problems with much faster data acquisition 
and analysis rates. It may be deployed for development of DNA 
identifiers based on genome sequences as technology advances. 
There have been a lot of applications of cloud computing in the last 
few years in the field of genomics and other biological research and 
development sector. New cloud computing tools and algorithms 
have been developed and successfully implemented to manage the 
huge data and to analyze them more efficiently in much lesser 
time. 
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